Multi-level Disambiguation Grammar Inferred from English Corpus, Treebank, and Dictionary

Authors

  • Eric Atwell
  • Simon Arnfield
  • George Demetriou
  • Steve Hanlon
  • John Hughes
  • Uwe Jost
  • Rob Pocock
  • Clive Souter
  • Joerg Ueberla
Abstract

In this paper we will show that Grammatical Inference is applicable to Natural Language Processing. Given the wide and complex range of structures appearing in an unrestricted Natural Language like English, full Grammatical Inference, yielding a comprehensive syntactic and semantic definition of English, is too much to hope for at present. Instead, we focus on techniques for dealing with ambiguity resolution by probabilistic ranking; this does not require a full formal Chomskyan grammar. We give a short overview of the different levels and methods being investigated at CCALAS for probabilistic ranking of candidates in ambiguous English input.
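As a rough illustration of what "probabilistic ranking of candidates" can look like, the sketch below scores competing part-of-speech tag sequences for one ambiguous input with a toy bigram model and sorts them by score. It is not the method described in the paper: the tag set, the probabilities in `BIGRAM`, the `FLOOR` smoothing value, and the function names `log_score` and `rank_candidates` are all assumptions made up for this example.

```python
import math

# Hypothetical bigram probabilities P(tag | previous tag).
# These numbers are invented for illustration, not taken from any corpus.
BIGRAM = {
    ("<s>", "DET"): 0.6,
    ("<s>", "NOUN"): 0.3,
    ("DET", "NOUN"): 0.8,
    ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.3,
}

# Assumed floor probability for tag pairs missing from the table.
FLOOR = 1e-6


def log_score(tags):
    """Return the log-probability of a tag sequence under the toy bigram model."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for tag in tags:
        total += math.log(BIGRAM.get((prev, tag), FLOOR))
        prev = tag
    return total


def rank_candidates(candidates):
    """Sort candidate analyses from most to least probable."""
    return sorted(candidates, key=log_score, reverse=True)


# Three competing analyses of the same three-word input.
candidates = [
    ("DET", "NOUN", "VERB"),
    ("DET", "VERB", "NOUN"),
    ("NOUN", "NOUN", "VERB"),
]
ranked = rank_candidates(candidates)

for tags in ranked:
    print(" ".join(tags), round(log_score(tags), 3))
```

Ranking only needs relative scores for the candidates on offer, which is why an approach of this kind can work without a complete formal grammar of the language.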

Similar resources

Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts

The aim of this paper is twofold. We focus, on the one hand, on the task of dynamically annotating English compound nouns, and on the other hand we propose disambiguation methods and techniques which facilitate the annotation task. Both the aforementioned are part of a larger on-going effort which aims to create HPSG annotation for the texts from the Wall Street Journal (henceforward WSJ) secti...

Sejong Korean Corpora in the Making

The 21st Century Sejong Project is a comprehensive project aiming to build various kinds of language resources including Korean corpora, comparable to BNC (Aston & Burnard, 1998), and Korean electronic dictionaries. The project was conceived of in 1997 and started in 1998 as a 10-year long-term project. By 2003, we completed 6 years of our work. The Sejong Corpora are a collection of raw corpor...

Towards an LFG parser for Polish: An exercise in parasitic grammar development

While it is possible to build a formal grammar manually from scratch or, going to another extreme, to derive it automatically from a treebank, the development of the LFG grammar of Polish presented in this paper is different from both of these methods as it relies on extensive reuse of existing language resources for Polish. LFG grammars minimally provide two levels of representation: constitue...

A new semantically annotated corpus with syntactic-semantic and cross-lingual senses

In this article, we describe a new sense-tagged corpus for Word Sense Disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Gramm...

The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking

Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, it is often the case that necessary lexical information is missing in the morphology or lexicon. The approach used to build the treebank is incremental parsebanking; a corpus is parsed with an ex...

Publication date: 2007